%0 Conference Proceedings
%4 sid.inpe.br/sibgrapi/2021/09.07.06.25
%2 sid.inpe.br/sibgrapi/2021/09.07.06.25.04
%@doi 10.1109/SIBGRAPI54419.2021.00043
%T TVAnet: a spatial and feature-based attention model for self-driving car
%D 2021
%A Flores-Benites, Victor,
%A Mugruza-Vassallo, Carlos Andrés,
%A Mora-Colque, Rensso Victor Hugo,
%@affiliation Universidad Católica San Pablo 
%@affiliation Universidad Nacional Tecnológica de Lima Sur 
%@affiliation Universidad Católica San Pablo
%E Paiva, Afonso,
%E Menotti, David,
%E Baranoski, Gladimir V. G.,
%E Proença, Hugo Pedro,
%E Junior, Antonio Lopes Apolinario,
%E Papa, João Paulo,
%E Pagliosa, Paulo,
%E dos Santos, Thiago Oliveira,
%E e Sá, Asla Medeiros,
%E da Silveira, Thiago Lopes Trugillo,
%E Brazil, Emilio Vital,
%E Ponti, Moacir A.,
%E Fernandes, Leandro A. F.,
%E Avila, Sandra,
%B Conference on Graphics, Patterns and Images, 34 (SIBGRAPI)
%C Gramado, RS, Brazil (virtual)
%8 18-22 Oct. 2021
%I IEEE Computer Society
%J Los Alamitos
%S Proceedings
%K visual attention, self-driving, spatial attention, feature-based attention
%X End-to-end methods facilitate the development of self-driving models by employing a single network that learns the human driving style from examples. However, these models face the problems of distributional shift, causal confusion, and high variance. To address these problems, we propose two techniques. First, we propose the priority sampling algorithm, which biases training sampling toward observations that are unknown to the model. Priority sampling employs a trade-off strategy that incentivizes the training algorithm to explore the whole dataset. Our results show uniform training over the dataset, as well as improved performance. Second, we propose a model based on the theory of visual attention, called TVAnet, which selects relevant visual information to build an optimal representation of the environment. TVAnet employs two visual information selection mechanisms: spatial and feature-based attention. Spatial attention selects regions whose visual encoding is similar to the contextual encoding, while feature-based attention selects disentangled features that carry useful information for routine driving. Furthermore, we encourage the model to recognize new sources of visual information by adding a bottom-up input. Results on the CoRL-2017 dataset show that our spatial attention mechanism recognizes regions relevant to the driving task and that TVAnet builds disentangled features with low mutual dependence. Moreover, our model is interpretable, facilitating the understanding of intelligent vehicle behavior. Finally, we report performance improvements over traditional end-to-end models.
%@language en
%3 109.pdf